Comma checking in Danish
Abstract
This paper describes research on using the Brill tagger (Brill 94, 95) to learn to identify incorrect commas in Danish. Trained on a part-of-speech tagged corpus of 600,000 words, the system identifies incorrect commas with a precision of 91% and a recall of 77%. To develop the system, commas were inserted at random into a text and tagged as incorrect, while the original commas were tagged as correct. The tagger was then trained to recognize the contexts in which incorrect commas occur. In what follows, we first describe the corpora and tag sets used in this research and give background on the Brill tagger. We then describe the methodology for learning to identify comma errors and examine some of the principles the system learned for identifying them. Finally, we present test results and discuss plans for future research. The method is quite general and could be applied fairly directly to a wide range of grammar-checking problems, in Danish or other languages.
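The data-preparation step described in the abstract (random comma insertion, with inserted commas labeled incorrect and original commas labeled correct) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the function name `insert_random_commas`, the labels `COMMA_OK`, `COMMA_BAD`, and `WORD`, and the `rate` and `seed` parameters are all hypothetical, and the real system works on part-of-speech tagged text rather than a single placeholder word label.

```python
import random


def insert_random_commas(tokens, rate=0.1, seed=0):
    """Return (token, label) pairs for a tokenized text.

    Hypothetical labels (not from the paper):
      COMMA_OK  - a comma present in the original text
      COMMA_BAD - a comma inserted at random by this function
      WORD      - placeholder standing in for a real part-of-speech tag
    """
    rng = random.Random(seed)  # seeded so that the corruption is reproducible
    labeled = []
    for i, tok in enumerate(tokens):
        labeled.append((tok, "COMMA_OK" if tok == "," else "WORD"))
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Only insert between two non-comma tokens, never at the end of the text.
        if tok != "," and nxt is not None and nxt != "," and rng.random() < rate:
            labeled.append((",", "COMMA_BAD"))
    return labeled


if __name__ == "__main__":
    sentence = "Han sagde , at hun kom i går".split()
    for token, label in insert_random_commas(sentence, rate=0.5, seed=1):
        print(token, label)
```

A tagger trained on such pairs would then be asked to predict the comma labels from context. Note that a randomly inserted comma can occasionally land in a position where a comma would in fact be acceptable, so the `COMMA_BAD` labels in this sketch are noisy by construction.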
Similar resources
Transformation-Based Learning of Danish Grammar Correction
We describe a technique for using the Brill Tagger to learn to identify grammar errors. We have applied this technique to two types of Danish grammar errors: incorrect commas, and incorrect article-noun agreement. The system identifies comma errors with a precision of 91%, while agreement errors are identified with 95% precision, with many of the system errors resulting from deficiencies in the ta...
Automated Deployment of Argumentation Protocols
The objective of this paper is to try to fill the gap between argumentation, electronic institutions and protocols by using a combination of automated synthesis and model checking methods. More precisely, this paper proposes a means of moving rapidly from argument specification to protocol implementation, using an extension of the Argument Interchange Format as the specification language and t...
Varieties of comma-free codes
New varieties of comma-free codes (CFC) of length 3 on the 4-letter alphabet are defined and analysed: self-complementary comma-free codes (CCFC), C3 comma-free codes (C3CFC), C3 self-complementary comma-free codes (C3CCFC), self-complementary maximal comma-free codes (CMCFC), C3 maximal comma-free codes (C3MCFC) and C3 self-complementary maximal comma-free codes (C3CMCFC). New properties with wor...
K-Comma Codes and Their Generalizations
In this paper, we introduce the notion of k-comma codes, a proper generalization of the notion of comma-free codes. For a given positive integer k, a k-comma code is a set L over an alphabet Σ with the property that LΣᵏL ∩ Σ⁺LΣ⁺ = ∅. Informally, in a k-comma code, no codeword can be a subword of the catenation of two other codewords separated by a “comma” of length k. A k-comma code is indeed a cod...
Evaluation of the Two Methods for Thinning in Oak Plantation based on Ecological Capability (Case study: Neka area, Mazandaran Province)
The study aimed to assess the Danish and Swiss methods of thinning in 20-year-old plantations of chestnut-leaved oak (Quercus castaneifolia C. A. Mey.) in terms of quantitative and qualitative characteristics of trees, natural regeneration, and plant and soil invertebrate diversity. The study area is located in Neka forests, east of Mazandaran province in the Caspian region. This research w...
Publication date: 2001